CLAIMS 



Having thus described our invention, what we claim as new, and desire to secure by 
Letters Patent is: 

1. A multistage microprocessor pipeline structure for executing
processing instructions comprising:

an instruction cache, an instruction buffer, a decoder, a register file, an
arithmetic logic unit, and a cache memory, wherein to start execution of an instruction,
the instruction is fetched from the instruction cache and loaded into the instruction
buffer, and the instruction is then decoded for the operation by the decoder, and
corresponding register values for execution of the instruction are read from the register
file and are input to the arithmetic logic unit which executes the instruction;

width determination logic receives outputs from the decoder and
determines a minimum effective processing operation width for executing each
processing instruction and propagates width control data along with data for execution
of the instruction through the pipeline structure for executing the processing
instruction;

the microprocessor pipeline structure comprises a plurality of reduced
bit width slices for execution of the instruction, wherein each slice comprises a
reduced bit width portion of the register file, a reduced bit width portion of the
arithmetic logic unit, and a reduced bit width portion of the cache memory, with a data
carry operation proceeding from a lesser significant slice to a more significant slice,
and the slices all operate in parallel when a full bit width processing operation is
executed, or only a minimum required number of slices is enabled if the width of the
processing operation is determined to be narrower than a full bit width processing
operation, and different slices are enabled and process data on a cycle-by-cycle basis.

2. The multistage microprocessor pipeline structure of claim 1,
wherein the width determination logic uses data about the length of operands stored in 
a register file tags module that stores value bit information about each operand in the 
register file, including the sign and width of each operand, and one or more leading 
bits of one or more bytes of each operand, which are examined for data overflow from 
a lesser significant slice to a more significant slice. 

3. The multistage microprocessor pipeline structure of claim 2, 
wherein the value bit information includes a sign of an operand in one bit, a register 
data width in bytes of the operand value in two bits, and one or more leading bits of 
one or more of the most significant bytes of the operand. 
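
By way of non-limiting illustration, the value bit information recited in claim 3 could be packed as in the following minimal sketch. It is not taken from the specification. The 32-bit two's complement word, the encoding of the byte width as width minus one in two bits, the choice of two leading bits from the most significant occupied byte, the field order, and the names `significant_bytes`, `pack_tag`, and `unpack_tag` are all assumptions made for the example.

```python
WORD_BYTES = 4   # assumed 32-bit word
LEAD_BITS = 2    # assumed number of leading bits kept in the tag


def significant_bytes(value: int) -> int:
    """Smallest byte count (1..WORD_BYTES) holding `value` in two's complement."""
    for n in range(1, WORD_BYTES + 1):
        if -(1 << (8 * n - 1)) <= value < (1 << (8 * n - 1)):
            return n
    raise ValueError("value does not fit in the assumed word size")


def pack_tag(value: int) -> int:
    """Pack sign (1 bit), width - 1 (2 bits), and leading bits into one tag."""
    n = significant_bytes(value)
    sign = 1 if value < 0 else 0
    pattern = value & ((1 << (8 * n)) - 1)   # n-byte two's complement pattern
    lead = pattern >> (8 * n - LEAD_BITS)    # top bits of the most significant byte
    return (sign << (2 + LEAD_BITS)) | ((n - 1) << LEAD_BITS) | lead


def unpack_tag(tag: int) -> tuple:
    """Return (sign, width_in_bytes, leading_bits) from a packed tag."""
    lead = tag & ((1 << LEAD_BITS) - 1)
    width = ((tag >> LEAD_BITS) & 0b11) + 1
    sign = tag >> (2 + LEAD_BITS)
    return sign, width, lead
```

Under these assumptions a tag occupies five bits. For example, `unpack_tag(pack_tag(300))` gives a sign of 0, a width of 2 bytes, and leading bits of 0.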

4. The multistage microprocessor pipeline structure of claim 2, 
wherein the output of the decoder indicates two source registers and one destination 
register, and an instruction operation code, and the register file tags module and the 
instruction operation code are used to determine the number of slices required for 
executing the corresponding processing instruction, and that number of slices is
enabled in subsequent cycles in the pipeline structure. 

5. The multistage microprocessor pipeline structure of claim 1, 
wherein a cache tag file stores addresses of the cache memory to write to and read 
from, and a width tag file stores the width of data stored in each memory address. 

6. The multistage microprocessor pipeline structure of claim 1, 
wherein the width determination logic outputs the width control bits to enable data 
flow and computation in the slices.

7. The multistage microprocessor pipeline structure of claim 1,
wherein enabling and disabling of each slice is accomplished by clock gating, where 
during enabling, clock signals allow data to proceed into and through a slice, and 
during disabling, clock signals block the flow of data into and through a slice. 
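
By way of non-limiting illustration, the clock gating of claim 7 can be pictured with a small behavioral model. This is only a sketch under assumptions: the 8-bit slice width, the class name `GatedSliceRegister`, and the functions `clock_edge` and `clock_all` are hypothetical, and actual gating is performed in circuitry rather than software.

```python
class GatedSliceRegister:
    """Behavioral model of one slice's pipeline register with a gated clock."""

    def __init__(self, width_bits: int = 8):
        self.mask = (1 << width_bits) - 1
        self.q = 0  # value currently held by the slice

    def clock_edge(self, d: int, enable: bool) -> int:
        """Model one clock edge: latch `d` only if the slice's clock is enabled."""
        if enable:
            self.q = d & self.mask  # clock passes: data proceeds into the slice
        # Clock blocked: the slice sees no edge and keeps its previous value.
        return self.q


def clock_all(slices, data, enables):
    """Apply one cycle to every slice; the enables may differ on each cycle."""
    return [s.clock_edge(d, en) for s, d, en in zip(slices, data, enables)]
```

In this model a disabled slice simply holds its previous contents, which is how the sketch represents data being blocked from flowing into and through the slice.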

8. The multistage microprocessor pipeline structure of claim 2, 
wherein the width determination logic determines the likelihood of a data overflow 
being generated from a narrow slice operation by examining one or more leading bits 
of the operands which are stored in the register file tags module and generates one of 
three determinations: 

data overflow is guaranteed not to occur, and the effective operation width is
determined by the width of the narrow operands;

data overflow is guaranteed, and the effective operation width must be
one byte larger than the width of the narrow operands; or

data overflow is possible but not certain, wherein a carry into the bits
examined is propagated as a carry out.
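
By way of non-limiting illustration, the three determinations of claim 8 resemble the kill, generate, and propagate cases of carry analysis. The sketch below assumes unsigned addition, operands that already fit in `width_bytes` bytes, two examined leading bits, and a four-byte word. The names `Overflow`, `classify_overflow`, and `effective_width` are hypothetical, and enabling the extra byte in the uncertain case is a conservative choice of this sketch rather than something the claim recites.

```python
from enum import Enum

WORD_BYTES = 4  # assumed full word width in bytes


class Overflow(Enum):
    NONE = "no overflow guaranteed"
    CERTAIN = "overflow guaranteed"
    POSSIBLE = "overflow depends on the carry into the examined bits"


def classify_overflow(a: int, b: int, width_bytes: int, lead_bits: int = 2) -> Overflow:
    """Classify the carry out of an unsigned add using only the leading bits.

    Both operands are assumed to fit in `width_bytes` bytes.
    """
    shift = 8 * width_bytes - lead_bits
    lead_sum = (a >> shift) + (b >> shift)  # sum of the examined leading bits
    limit = 1 << lead_bits
    if lead_sum >= limit:
        return Overflow.CERTAIN   # the leading bits alone already carry out
    if lead_sum == limit - 1:
        return Overflow.POSSIBLE  # a carry in would pass straight through
    return Overflow.NONE          # even a carry in cannot reach the top


def effective_width(a: int, b: int, width_bytes: int) -> int:
    """Bytes to enable; the uncertain case is treated like the certain one."""
    if classify_overflow(a, b, width_bytes) is Overflow.NONE:
        return width_bytes
    return min(width_bytes + 1, WORD_BYTES)
```

For one-byte operands, `effective_width(3, 4, 1)` stays at one byte, while `effective_width(200, 100, 1)` grows to two bytes because the leading bits alone guarantee a carry out.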

9. The multistage microprocessor pipeline structure of claim 1, 
wherein following execution and completion of a processing operation by the 
arithmetic logic unit, the width of the value of the processing operation result is 
determined, after which the width determination logic determines value bit 
information for the processing operation result by combining its sign bit, value width 
and one or more leading bits for its one or more leading bytes, which is then written to 
a destination register in the register file. 

10. A method for reducing logic activity in the execution of an 
operation in a processor comprising the steps of: 

selecting at least one operand associated with said operation,

looking up a width and a value of selected bits of said at least one operand,

determining a prediction of arithmetic overflow, based upon the width and the value
of said selected bits of said at least one operand,

determining an effective width of said operation based upon the width of said at
least one operand, a function specified by said operation, and said prediction of
arithmetic overflow,

enabling the width of the resources in said processor corresponding to said
effective width of said operation for executing said operation,

executing said operation,

determining the width of the result of said operation based upon the step of
executing.
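
By way of non-limiting illustration, the steps of claim 10 can be traced end to end in software. The sketch below is an assumption-laden model, not the claimed hardware: it supports only unsigned `add` and bitwise `and`, assumes a 32-bit word of four byte-wide slices and two examined leading bits, computes the width on the fly instead of reading a stored tag, and uses the hypothetical names `lookup`, `predict_overflow`, `effective_width`, and `run`.

```python
WORD_BYTES = 4   # assumed 32-bit word made of four byte-wide slices
LEAD_BITS = 2    # assumed number of leading bits examined per operand


def lookup(value: int):
    """Stand-in for a tag lookup: (width in bytes, leading bits of the top byte)."""
    width = max(1, (value.bit_length() + 7) // 8)
    return width, value >> (8 * width - LEAD_BITS)


def predict_overflow(op: str, a: int, b: int) -> str:
    """Return 'none', 'possible' or 'certain' from widths and leading bits only."""
    if op != "add":
        return "none"  # a bitwise AND cannot overflow
    (wa, la), (wb, lb) = lookup(a), lookup(b)
    w = max(wa, wb)
    # A narrower operand contributes zero leading bits at the operation width.
    lead_sum = (la if wa == w else 0) + (lb if wb == w else 0)
    if lead_sum >= (1 << LEAD_BITS):
        return "certain"
    return "possible" if lead_sum == (1 << LEAD_BITS) - 1 else "none"


def effective_width(op: str, a: int, b: int) -> int:
    """Bytes to enable, from operand widths, the function, and the prediction."""
    wa, wb = lookup(a)[0], lookup(b)[0]
    if op == "and":
        return min(wa, wb)  # the upper bytes of the result are zero
    extra = 0 if predict_overflow(op, a, b) == "none" else 1
    return min(max(wa, wb) + extra, WORD_BYTES)


def run(op: str, a: int, b: int):
    """Execute on the enabled slices only; return (result, enables, result width)."""
    if op not in ("add", "and"):
        raise ValueError("this sketch models only 'add' and 'and'")
    width = effective_width(op, a, b)
    enables = [i < width for i in range(WORD_BYTES)]  # slice 0 is least significant
    raw = (a + b) if op == "add" else (a & b)
    result = raw & ((1 << (8 * width)) - 1)           # disabled slices contribute zero
    return result, enables, lookup(result)[0]
```

For example, `run("add", 200, 100)` enables two slices because overflow out of one byte is predicted as certain, and it reports a two-byte result width for the sum 300.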

11. The method of claim 10, including saving the width of the result of
said operation, and saving said result of said operation.

12. The method of claim 10, wherein said step of looking up a width
and a value of selected bits includes dedicated hardware for holding and retrieving
said width and said value of selected bits.

13. The method of claim 10, wherein the processor includes a register
file, an arithmetic unit, a memory path, and a cache memory, and the register file, the
arithmetic unit, and the cache memory are divided into a plurality of slices, each of
which is of a reduced bit granularity, and the bits in all of the slices form a full width
word in the processor.

14. The method of claim 13, wherein at least one slice is of 8 bit
granularity.

15. The method of claim 13, wherein at least one slice is of 16 bit
granularity.

16. The method of claim 13, wherein said step of enabling includes 
logic to enable a required number of slices to execute the operation. 
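
By way of non-limiting illustration, the slicing of claims 13 to 16 can be modeled by splitting a word into slices of a chosen granularity and adding slice by slice, with the carry proceeding from each slice to the next more significant one. The sketch assumes unsigned values and a 32-bit word. The names `split_into_slices`, `slices_required`, and `sliced_add` are hypothetical.

```python
def split_into_slices(value: int, slice_bits: int, word_bits: int = 32):
    """Split a word into slices, least significant slice first."""
    mask = (1 << slice_bits) - 1
    return [(value >> i) & mask for i in range(0, word_bits, slice_bits)]


def slices_required(value: int, slice_bits: int) -> int:
    """Number of slices needed to hold an unsigned value (at least one)."""
    return max(1, -(-value.bit_length() // slice_bits))  # ceiling division


def sliced_add(a: int, b: int, enabled: int,
               slice_bits: int = 8, word_bits: int = 32) -> int:
    """Add using only the `enabled` low slices, rippling the carry upward."""
    mask = (1 << slice_bits) - 1
    a_slices = split_into_slices(a, slice_bits, word_bits)
    b_slices = split_into_slices(b, slice_bits, word_bits)
    result, carry = 0, 0
    for i in range(enabled):  # disabled slices do no work
        total = a_slices[i] + b_slices[i] + carry
        result |= (total & mask) << (i * slice_bits)
        carry = total >> slice_bits  # carry into the next, more significant slice
    return result
```

With 8 bit granularity the value 300 needs two slices, while with 16 bit granularity one slice suffices. Calling `sliced_add(200, 100, enabled=1)` shows what is lost when too few slices are enabled: the carry out of the only active slice is dropped.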

17. A processor comprising:

a plurality of slices, each of which is a portion of a full width word of 
the processor, wherein each slice comprises a portion of a register file, a portion of 
functional units, a portion of a memory path, a portion of a cache memory, and a 
portion of other resources required to perform operations in the processor, 

logic to save and retrieve a width and selected bits of operands used to 
perform an operation in the processor, 

logic to determine a prediction of arithmetic overflow when performing 
the operation, based upon the width and the selected bits of the operands used to 
perform the operation, 

logic to determine a number of slices required to perform the operation 
based upon the width of one or more operands, the functionality of the operation, and 
the prediction of arithmetic overflow, 

logic to activate the slices required to perform the operation. 

18. The processor of claim 17, including logic to determine the width
of the result of the operation, 

circuitry to store said width and selected bits of said result, and 
circuitry to store said result. 